The next stage was to hold off the voice echo until the DVI activate switch was released. The echo was heard within half a second of release of the switch. This was a much more satisfactory method of implementation which was much preferred by pilots. In addition, the list of words which were echoed was increased to include the keywords such


ROYAL  AIRCRAFT  ESTABLISHMENT 
Technical  Memorandum  FS(B)  637 
Received  for  printing  22  April  1986 


A  FLIGHT  EVALUATION  OF  VOICE  INTERACTION  AS  A  COMPONENT 
OF  AN  INTEGRATED  HELICOPTER  AVIONICS  SYSTEM 


by 

R.  Little 

Lt Cdr R. Cowan RN


SUMMARY 




A Wessex helicopter at RAE Bedford was used to develop and evaluate an integrated avionics system which incorporated advanced displays and a flight management system for both military and civil applications. Two important features of the system were automatic speech recognition and synthetic speech output. Flight trials have been conducted to establish guidelines for the successful integration of these devices with advanced avionics such as colour displays, digital maps and touch overlays. The use of speech technology in the cockpit offers an element of redundancy and, if correctly integrated, will be capable of improving the man machine interface to a far greater degree than is achievable by hand or voice alone. The trial has shown that data input and retrieval from such a well structured cockpit management system can be achieved quickly, simply and easily.


Copyright © Controller HMSO London 1986


DISTRIBUTION STATEMENT A
Approved for public release
Distribution Unlimited




LIST OF CONTENTS

1 INTRODUCTION
2 THE SPEECH TECHNOLOGY EQUIPMENT
2.1 The automatic speech recogniser
2.2 The synthetic speech system
3 THE TRIALS PROGRAMME
3.1 Speech recognition
3.2 Speech output
4 SPEECH RECOGNITION RESULTS
4.1 System reliability and packaging
4.2 Recognition performance
4.2.1 Vocabulary
4.2.2 Active words/syntax
4.2.3 Signal/noise effects
4.2.4 Pilot variability
4.2.5 Measured results
4.3 Piloting aspects
4.3.1 Speech quality
4.3.2 Visual cues
4.3.3 Voice echo
4.3.4 Uses
4.4 Integration aspects
5 SPEECH OUTPUT RESULTS
5.1 Speech intelligibility
5.2 Speech output control
5.3 Operational effectiveness
5.3.1 Priority information
5.3.2 Optional information
5.4 Voice interaction
6 CONCLUSIONS
Appendix A The Wessex facility
Appendix B Speech recognition and its implementation in the Wessex
Appendix C Trials programme and procedures
Table 1 Speech recognition vocabulary
Table 2 Speech recognition syntax
Table 3 Total speech output vocabulary
Table 4 Priority speech output vocabulary
Table 5 Optional speech output vocabulary
Reference
Illustrations: Figures 1-10
Report documentation page: inside back cover




1 INTRODUCTION

A Wessex 2 helicopter at RAE Bedford was used to evaluate a suite of experimental avionics in flight. The new equipment comprised a set of colour CRT displays (one with a touch sensing overlay), automatic speech recognition (ASR), synthetic speech output (SSO), a monochrome control and display unit (CDU) (also with a touch sensing overlay), and microprocessors for interface and algorithms related to engines, transmission, radios, guidance, airdata and navigation systems (including a digital map). The total cockpit was night vision goggle (NVG) compatible. The equipment and software were integrated to provide a user friendly advanced cockpit avionics and flight management system which could form the core of any future helicopter cockpit package. A detailed description of the Wessex facility is given in Appendix A and Ref 1.

The purpose of the development programme was two-fold:

(a) To determine how the total pilot cockpit display interfaces and supporting avionics could best be integrated and flight management data presented to a pilot, to optimise his performance, reduce workload and hence increase the overall mission effectiveness.

(b) To demonstrate a representative FLYING system to Service and Civil operators, and UK Industry, and to stimulate their appreciation of the benefits that new technology offers when used to advantage.

This Memorandum, which is one in a series concerned with the sub-systems in the Wessex, describes the performance and operational use of the speech recognition and synthesis system in the cockpit. These recent innovations have attracted much interest, in that it was hoped that their use would alleviate some of the high pilot workload associated with the airborne management of a sophisticated suite of avionics. The reasoning was that the use of speech technology would enable the pilot to concentrate on flying, with eyes out of the cockpit. This would enhance flight safety.


During the trials the main thrust of the work was towards the integration of the new technologies, to discover when and how they should be used in combination with other technologies in the cockpit. Although the individual performance of the speech recogniser and synthesiser was measured, this was of secondary importance for two reasons. Firstly, the equipment used did not represent the standard which could be expected to be in Service in two or three years' time. Secondly, criteria for sensible performance measurement of speech recognition and synthesis systems have yet to be defined.

2 THE SPEECH TECHNOLOGY EQUIPMENT

2.1 The automatic speech recogniser


The device used in the Wessex was the SR-128 speaker dependent word recogniser. It could accept phrases up to 8 s in duration and had sufficient memory to be able to recognise up to 128 s of speech utterances. When using the system the speaker had to provide an isolated utterance of each word in the vocabulary to be employed. The pilot initiated a command to the recogniser by operating a switch on the cyclic flying control. A full description of the SR-128 and its installation in the Wessex is given in Appendix B. Table 1 lists the vocabulary used in the Wessex.

2.2 The synthetic speech system

A synthetic speech facility which could be used to give audio messages or warnings to the pilot was designed and installed in the processing and interface unit. This was based on digitally recorded speech using a Texas Instruments 5220 linear predictive coding speech processor, which could store up to 500 words of vocabulary. Simple commands from the main processor could then select a word or group of words, its volume, and its priority for output if more than one word or phrase was called. Any vocabulary could be programmed by any voice, using a complementary portable speech laboratory specifically produced by Texas Instruments for that task. This facility enabled a user to program any words or phrases into programmable read only memory (PROM) which was then inserted into the speech interface card. The voice output was fed into the aircraft intercom system via an onboard amplifier. The synthetic speech could be totally isolated from the intercom by deselecting it on the intercom/radio station box.

Three groups of vocabulary were distinguished by the main processor system. The first corresponded to warnings such as engine failure. The second corresponded to optional information such as height calls used for the closing stages of instrument approaches. This level could be inhibited via the CDU, to eliminate nuisance calls when hover taxying or flying nap of the earth.

The  third  group  corresponded  to  a  voice  echo  of  the  words  recognised  by  the  SR-128. 

Table  5  shows  the  vocabulary  that  was  stored  on  the  speech  card. 
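
The three vocabulary groups and the CDU inhibit can be pictured with a small sketch. This is only an illustration under stated assumptions: the relative ordering of the optional and echo groups, the class and method names, and the example phrases are all hypothetical and are not taken from the Wessex software.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Assumed ordering: lower number is spoken first. The memorandum states only
# that each phrase carried a priority; this particular ranking is a guess.
WARNING, OPTIONAL, ECHO = 0, 1, 2


@dataclass
class SpeechOutputQueue:
    """Hypothetical arbiter for pending synthetic speech phrases."""
    optional_inhibited: bool = False          # stands in for the CDU inhibit
    _heap: list = field(default_factory=list)
    _order: count = field(default_factory=count)

    def request(self, group, phrase):
        """Queue a phrase; optional calls are dropped while inhibited."""
        if group == OPTIONAL and self.optional_inhibited:
            return False
        # The running counter keeps first-come order within one group.
        heapq.heappush(self._heap, (group, next(self._order), phrase))
        return True

    def next_phrase(self):
        """Return the highest priority pending phrase, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With the inhibit set, a height call would be discarded while a warning queued after a voice echo would still be spoken first.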

3 THE TRIALS PROGRAMME

The man-machine interface of the Wessex cockpit was considered particularly important. The speech recogniser and synthetic speech output systems were an integral part of this interface. Speech recognition could be used in parallel with tactile controls such as keys and touch sensing overlays. Speech output could be used in parallel with the monochrome and colour displays. A description of the overall Wessex trials programme is given in Appendix C.

3.1  Speech  recognition 

To study the performance of the DVI system in flight many sorties were devoted entirely to its use. In this way not only was the speech recogniser performance measured but also the problems and attributes of using DVI during all modes of flight for a wide variety of tasks were assessed. On other occasions, pilots were briefed to use the DVI only when they wanted. They were free to use other methods of control such as switches or touch overlays at the same time. In this way it was hoped to discover the relative merits of different control systems and where each was preferred.

The speech recogniser could control the following functions:

(a) Multi function display formats

- call digital map with tactical overlay
- alter map scale
- call engines display
- call hover display
- call check lists - take off, shutdown etc
- call waypoint data such as ranges, bearings, ETAs
- call flight status data such as endurance, all up weight
- call radio comms plans
- call limits data (icing)

(b) Guidance

- select ILS or MLS sensors
- input runway headings
- input glideslope for MLS
- input cruise speed and height for the flight directors
- input decision height
- select go-around

(c) Communications

- select relevant radio
- select a preset frequency
- store a preset frequency
- input a radio frequency
- select a frequency by title (eg 'Bedford Approach')

(d) Navigation

- fix to a planned waypoint
- select a reporting base
- select a waypoint to put steering data on the PFD
- set and allocate a waypoint number to a chosen point on the nav plot
- update waypoint grid locations
- select or alter a route

3.2 Speech output

The speech output was used on all flights. The vocabulary was split into priority warnings and advisory information as already described. Tables 4 and 5 show the two groups of phrases and the events which the program used to prompt the speech output.

In addition the system was used to echo some of the DVI commands to give the pilot audio feedback on the word or phrase recognition that had taken place.

4 SPEECH RECOGNITION RESULTS

The evaluation of ASR can be split into several areas corresponding to system reliability, recognition performance, pilot use and acceptance, and integration aspects.

4.1 System reliability and packaging

The SR-128 equipment was powered by 240 V/50 Hz supplies and was designed for ground based applications. Luckily, it fitted into 19 inch racking and the internal power supplies could also be driven from the 115 V/400 Hz aircraft systems. The Wessex was already fitted with 19 inch racks so the installation in the aircraft was straightforward. In the subsequent flight trials the equipment has proved to be very reliable, failing only once in over 150 hours of test flying.

No interference effects were detected between the SR-128 and other equipment.

It is thought that with the present advances in VLSI technology and suitable power supplies the ASR equipment of the near future will be supported on a single card system which could then be an interface card in a processing and interface unit. This offers obvious benefits in system integration.

4.2  Recognition  performance 

The recognition performance of an ASR system has been found to be dependent on many factors. The figure of merit which is attached to this type of equipment is consequently without meaning unless other parameters concerning the ASR and the experiment which led to the results are known.


4.2.1 Vocabulary


The content of the vocabulary could have a great effect on the recognition performance. For example, most recognisers would perform well when distinguishing between the words 'speech' and 'voice'; however, the same recogniser would probably make more errors if the words were 'speech' and 'teach'.


This problem arose on many occasions during the initial DVI experiments in the Wessex helicopter. A good example occurred during navigational fixes over pre-designated waypoints. Unfortunately, the words 'fix' and 'six' were often confused, resulting in poor recognition performance and pilot acceptability. To overcome the problem the word 'plot' was adopted in place of 'fix'. Pilots quickly became accustomed to the word and recognition performance increased considerably. Similarly 'ok' was preferred to 'enter' as the executive command. The vocabulary is shown in Table 1.


4.2.2 Active words/syntax


As the number of words which the ASR system was required to recognise increased the recognition performance degraded. Although, ideally, all the words in the vocabulary would be active at all times, it was found that syntax was required to reduce the number of active words at any one time and thus preserve recognition accuracy and improve recognition speed. However, syntax introduced its own problems if it was not properly structured.


As with complex menu structures used in modern Control and Display Units, if the syntax was too restrictive pilots found the system unfriendly. This was particularly apparent when the ASR mis-recognised the pilot's words and diverted the software to another area (called a node) of the syntax tree. This could result in the pilot thinking he had manipulated the programme correctly when in fact he had not.

A delicate balance between recognition performance and 'user friendliness' resulted in a syntax structure which was very similar to the Wessex's tactile system, based on the CDU menu and dedicated keys. This was not entirely surprising as great effort had been taken to keep the CDU control 'user friendly'. The maximum number of words which were active at one time was thus reduced from 66 to 33. The major advantage of such a system was that the pilot used the same syntax whether he used the DVI or the touch overlay and keys. In both cases the CDU display reacted in the same way and could be used to remind the pilot of the active words available at any time. This of course required the CDU to follow the ASR programme and vice-versa. Thus commands started on the CDU could be finished by voice or the other way round.
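
The idea of a syntax tree that restricts the active words, and that is shared by voice and tactile input, can be sketched as follows. The node names and word lists here are hypothetical and far smaller than the Wessex syntax of Table 2; the sketch only shows how one shared state lets a command begin on one input channel and finish on the other.

```python
# Hypothetical fragment of a syntax tree: node -> {active word: next node}.
# Only the words listed at the current node are offered to the recogniser.
SYNTAX = {
    "menu": {"comms": "comms", "navigation": "nav", "guidance": "guidance"},
    "comms": {"frequency": "digits", "menu": "menu"},
    "nav": {"plot": "digits", "menu": "menu"},
    "guidance": {"menu": "menu"},
    "digits": {
        **{str(d): "digits" for d in range(10)},
        "ok": "menu",
        "rubout": "digits",
    },
}


class Dialogue:
    """Single syntax state shared by voice and touch inputs (an assumption)."""

    def __init__(self, syntax, start="menu"):
        self.syntax = syntax
        self.node = start

    def active_words(self):
        """Words the display could prompt and the recogniser should match."""
        return sorted(self.syntax[self.node])

    def accept(self, word):
        """Apply a word from either channel; reject it if not active here."""
        next_node = self.syntax[self.node].get(word)
        if next_node is None:
            return False
        self.node = next_node
        return True
```

Because both channels call the same `accept` method, a selection made by key and digits entered by voice advance the same node, which is the behaviour the mutual tracking of CDU and ASR was intended to give.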

4.2.3 Signal/noise effects

One of the greatest concerns at the outset of the programme was the capability of the ASR system to cope with the high background noise and variable signal levels in the helicopter. Measurements during flight indicated that the noise level in the cockpit was 105 dBA, which was about 20 dB greater than the noise on the flight-deck of the BAC 1-11 which had successfully flown the SR-128 in earlier trials.

To reduce the effect of ambient noise the SR-128 employed a noise mask technique, see Appendix B. The technique was to use the SR-128 to sample the ambient noise, store it as a mask and use it to reduce the effect of noise when the ASR was used. The success of this technique was dependent on the power spectrum of the noise being stationary. Trials in a Buccaneer at RAE Farnborough and the BAC 1-11 at Bedford had experienced problems using the SR-128 due to the dependence of the noise on aircraft speed, power setting and configuration. To retain adequate performance repeated noise samples were required as the flight regime changed. This was time consuming and tedious. Fortunately, the problem was not so severe in the Wessex, since the principal noise characteristics were dominated by rotor and transmission systems which rotate at governed rates. Power changes in helicopter turbine engines do not produce large variations in engine compressor noise when compared to the background transmission noise.

As a result one sample of noise sufficed for most conditions during a flight. Having said that, there were several occasions when recognition deteriorated for no apparent reason. When this occurred noise masks were taken on the ground, in the hover and in forward flight and this generally improved the situation somewhat. Fig 4 shows the mean noise mask levels in each of the ASR channels for different flight conditions. On another occasion when recognition was particularly poor, changing the pilot's microphone and retaining the voice template dramatically improved recognition from about 50% to nearly 100%. This highlights a major area of concern for the introduction of a DVI system into operational use. Existing equipment such as microphones will have to be considered as part of the DVI and as such may need more stringent tests than are currently used to check for adequate intercom performance. It also highlights the need to ensure that the pilot's microphone was positioned correctly and, moreover, that it had the same characteristics as when it was originally used to prepare the voice templates.
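
The noise mask idea can be illustrated with a minimal sketch. The SR-128 method itself is described in Appendix B and is not reproduced here; the code below simply assumes that each frame is a list of per-channel energies and that the mask is subtracted channel by channel, which is one common way of using a stored noise sample. The function names are hypothetical.

```python
def make_noise_mask(noise_frames):
    """Average per-channel energy over frames sampled with no speech."""
    n = len(noise_frames)
    return [sum(channel) / n for channel in zip(*noise_frames)]


def apply_noise_mask(frame, mask):
    """Subtract the stored mask from one frame, flooring each channel at zero."""
    if len(frame) != len(mask):
        raise ValueError("frame and mask must have the same number of channels")
    return [max(energy - noise, 0.0) for energy, noise in zip(frame, mask)]
```

Such a scheme only helps while the noise spectrum stays close to the stored mask, which is consistent with the need to retake masks when the flight regime changed.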

Other factors which were found to affect recognition were the volume of the radio and the intercom. Loud radio tended to make pilots speak more loudly and, paradoxically, loud intercom made them speak more softly. The effect of both of these not only varied the signal to noise ratio but also the speech characteristics of the pilot. Both effects reduced the recognition performance.




As the templates had been created at a fairly constant mean volume any variations from this in operation could degrade the pattern matching technique. The SR-128 appeared to be particularly sensitive to volume variations, which resulted in the pilots having to talk to the DVI in a deliberate unnatural manner in order to obtain adequate recognition performance.

To overcome these effects future ASR systems must employ active noise compensation techniques. Furthermore a system which employed automatic gain control (AGC) of the microphone level would reduce the variability of the signal/noise ratio, although the pilot's speech characteristics would still change with varying loudness.
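
As an illustration of what automatic gain control of the microphone level involves, the sketch below scales successive frames of samples towards a target RMS level with a smoothed gain. It is a generic AGC loop, not a description of any equipment flown in the Wessex; the parameter names and default values are assumptions.

```python
import math


def agc(frames, target_rms=0.1, max_gain=10.0, smoothing=0.9):
    """Scale each frame towards target_rms using a slowly varying gain."""
    gain = 1.0
    output = []
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > 0.0:
            # Gain that would hit the target exactly, limited to max_gain.
            desired = min(target_rms / rms, max_gain)
            # Smooth the change so the gain does not jump between frames.
            gain = smoothing * gain + (1.0 - smoothing) * desired
        output.append([x * gain for x in frame])
    return output
```

A loop of this kind evens out the level presented to the recogniser, but it does nothing about the change in speech characteristics that accompanies louder or softer speaking.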

4.2.4  Pilot  variability 

During some of the RAE trials the word error rate was found to vary between pilots. The variations were not random. Pilots who achieved good recognition performance did so regularly. Some pilots, however, produced inconsistent results which were invariably worse than the results obtained from the 'consistent' pilots. The variation was as much as 5% in error rate from typically 2-10%. A technique which has been evolved at RSRE Malvern was used to determine if this problem was a function of the ASR or the pilots themselves. Pilots were asked to record a number of words and phrases many times. By correlating utterances of the same word it was demonstrated that some pilots, even when trying to speak in a repeatable manner, produced a large spread of acoustic patterns compared to other pilots. Consequently, any ASR which used acoustic pattern matching techniques would produce results which were dependent on the individual user. The significance of this problem has yet to be established, particularly the variations that may occur over a wider population of pilots with different nationalities and accents.
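
One way to picture the comparison of repeated utterances is to compute the mean pairwise correlation between feature vectors of the same word spoken several times. The RSRE technique is not described in this Memorandum, so the sketch below is only an assumed stand-in: the feature vectors, the use of Pearson correlation and the function names are all hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def consistency(utterances):
    """Mean pairwise correlation over repetitions of one word by one speaker."""
    pairs = list(combinations(utterances, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

A speaker whose repetitions give a score near 1 is producing repeatable acoustic patterns; a markedly lower score indicates the wide spread described above.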

Future ASR systems will need to cope with within-speaker variability as pilot selection on the basis of DVI recognition performance is not likely to be favoured by operators or pilots.

One possible solution being investigated by a number of speech technology groups involves Hidden Markov Modelling techniques to model speech variations. It is subtle changes due to stress, health, age etc that would need to be tackled by such a system.

4.2.5  Measured  results 

The results presented in this section relate to flights when pilots were briefed to use the DVI system only when they wished to do so because it was considered the most attractive option under the circumstances prevailing at the time. They were free to use other methods of control such as the keyboard or touch overlays when they felt it would be easier. All of the pilots used a boom microphone. It was found that the use of a foam sock over the microphone significantly reduced breath noise. Although this did not appear to affect the SR-128 recognition performance, audio recordings were much clearer.

Measurements from initial pilot/DVI training flights and flights during which equipment problems were uncovered have not been included in the analysis. The results are therefore representative of three trained pilots using a fully operational system.




(a) Overall recognition performance

The total number of words or phrases which were used in the analysis was 5290, of which 345 were mis-recognised. This corresponds to a recognition rate of 93.5%. The three pilots used on the Wessex trials recorded performances to within 1% of each other. It is interesting to compare this with the results from the BAC 1-11 trials in which consistent differences in performance were obtained between the trials pilots. The similar performances in the Wessex may be due to the high ambient noise condition which prevailed in the helicopter, which was about 20 dB greater than that in the BAC 1-11. This high ambient noise may have induced the pilots to articulate more precisely to combat the noise and prevented them speaking as naturally as they would on the flight deck of a jet transport aircraft. Thus the performance of the SR-128 recogniser in relatively noisy environments may be less dependent on the individual characteristics of the pilots.
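
The overall figure quoted above follows directly from the two counts. A short check, with a hypothetical helper name, is:

```python
def recognition_rate(total, misrecognised):
    """Percentage of words or phrases recognised correctly."""
    return 100.0 * (total - misrecognised) / total


rate = recognition_rate(5290, 345)
print(f"{rate:.1f}%")  # prints 93.5%
```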

(b) Word or phrase recognition performance

The recognition performance of individual words or phrases is shown in Figs 5 and 6. Recognition rates for individual phrases vary between 53% and 100%. Although detailed statistics of individual words in the DVI vocabulary are unreliable, given the small number of samples, some important features can still be observed.

The phrase 'tees and pees' was successfully recognised on only 53% of occasions. If this phrase is removed from the overall performance analysis then the recognition rate increases from 93.5% to 94.5%. It is interesting to note that this phrase was used to call up an engine monitor format on the colour MFD. If the phrase was mis-recognised then the pilot had simply to repeat the phrase. Although pilots found such frequent errors very annoying they considered the benefits of speech sufficiently great to persist with its use.

The effect of small changes in the vocabulary can be seen by comparing the two phrases 'map up' and 'map down'. These two phrases were used in identical areas of the syntax and were used by the pilot either to display the navigation format on the colour MFD or to change the range scales (hence up or down) if the format was already displayed. However, a 7% difference in recognition rate for these two phrases has been measured.

The executive command 'ok' was the word used most during the trials. It recorded a recognition rate of over 97%. The executive word for 'clear', which was 'rubout', achieved a 100% recognition rate. Low error rates for executive commands such as these are obviously important. It should be remembered that the commands 'enter', 'clear', 'cancel' and 'confirm' were all tried but none was as successful as 'ok' and 'rubout'.

The main goal of the programme has been to establish what to do with DVI rather than to find out how good current recognisers are. To achieve this, the vocabulary was tailored to include unique words which were unlikely to be required in future recognisers. Carefree speech cannot be handled by speech recognisers yet. As a result, strange words have appeared in the CDU programme such as plot (fix), fife (five), niner (nine), ok (enter), rubout (erase/clear). A major benefit of better recognisers would be the ability to choose a vocabulary to suit the pilot instead of the recogniser.




Because aviation voice procedure is well established, changing the associated vocabulary to suit speech recognisers would introduce its own problems. Therefore the trials vocabulary is seen only as a temporary measure.

(c)  Digit  recognition 

The overall recognition rate for the digits 0-9 and the executive ok and rubout commands was 95.4%.

A confusion matrix for the digits is shown in Fig 7. The recognition performance varied between 99. for the digit '3' to 90% for the digit '4'. It is also interesting to note that the digits '7' and '8' were never confused with other digits.

When quoting recognition error rates for connected or continuous word recognisers one has to be extremely cautious. Although a 'word' error rate is of great interest and use when comparing the performance of different equipments, care has to be taken when using the results as a measure of operational effectiveness. Consider, for example, a defined word error rate of 5%, ie a 1 in 20 recognition failure rate. If a pilot now uses this system to input a series of 10 digit strings for navigation coordinates then a 50% navigation update error rate will result, which is not so attractive. Of course, the rate with which a pilot can input data is also important and it should be pointed out that a typical input error rate for a keyboard is of the order 5%.
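
The arithmetic behind this caution can be sketched as follows. Multiplying the word error rate by the string length gives 10 x 5% = 50%, which is an upper bound; if the word errors are assumed to be independent (an assumption not stated in the text), the chance of at least one error in a 10 word string is somewhat lower, at about 40%. The function names are hypothetical.

```python
def string_error_bound(word_error_rate, n_words):
    """Upper bound: sum of the individual word error probabilities."""
    return min(1.0, word_error_rate * n_words)


def string_error_rate(word_error_rate, n_words):
    """Chance of at least one error, assuming independent word errors."""
    return 1.0 - (1.0 - word_error_rate) ** n_words
```

Either way, a word error rate that looks small in isolation becomes a large failure rate once a complete string has to be entered without error.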

(d) Word usage

Figs 8 and 9 show the DVI word or phrase usage rate for the three pilots. As can be seen all three pilots used the DVI vocabulary in similar relative ways. Data entry was the most used feature of the DVI. In addition, selection of key areas in the CDU structure such as comms, navigation, guidance or menu was accomplished many times by voice. Display mode selections, which could be done at any time by one key depression, were also made by voice, for example, map up, map down and tees and pees.

There are clearly defined situations when speech recognition is the preferred method of control or data insertion and others when equally clearly it is not. Between these two extremes the relative advantages of voice and tactile control are less clear and the key to success lies in the integration of the two.

In a noisy radio environment, with a busy navigation task, plus perhaps a need to set up the guidance page on the CDU for an approach, DVI is not easy to use. This is particularly true if there is also a high level of intercom chatter. The initiation and maintenance of any conversation absorbs the pilot's time and concentration. The busy pilot may be listening to two radios plus the intercom, and also talking to one agency by radio and other aircrew on intercom. He hardly wants to open a rigidly structured conversation with his aircraft too. Nevertheless, the pilot may choose to use voice for parts of his cockpit management, even when busy. In these situations the recognition performance must be very good.




4.3 Piloting aspects

4.3.1  Speech  quality 

The pronunciation of many words in normal speech is dramatically affected by their context. For this reason when the system was trained, it was necessary for the pilot either to anticipate the application context or else say the vocabulary in a neutral manner and without emphasis. Pilots found they frequently had consciously to attempt to recreate the word as trained. The use of speech recognition is clearly a skill which has to be acquired.

Pilots using DVI for the first time often needed three or four flights, with interim re-training periods, before the recognition performance of the DVI was at the high levels described in section 4.2. This was a function primarily of the time needed to acquire this 'DVI skill'.

4.3.2  Visual  cues 

As errors can occur in word recognisers, it is necessary to feed back to the pilot
the results of the utterance. One direct method used in the Wessex was a read out line
on the CDU monochrome display to tell the pilot what the ASR had recognised (see Fig 2).
Initially this line was invariably checked but less so once voice echo was introduced.

It was intended during the research programme that the DVI programme and its
syntax tree should closely resemble the CDU page structures. This meant that the CDU
prompted the available DVI vocabulary thus eliminating the possibility of the subject
pilot finding himself in a corner of the programme wondering how to get out.

A problem which was noted early on in the trials was the inability of the ASR
syntax to track that of the CDU although the CDU was able to track the ASR syntax. Thus
if the ASR was left in one mode whilst the CDU moved to another and if the ASR was not
then moved to the new mode, chaos could ensue, ie the system was unfriendly.

The difficulty has now largely been resolved although the DVI takes a second or
two to follow the CDU. This caused difficulty at times when mixing input channels
quickly. Essentially, in a system to be introduced into service mutual tracking must be
fast and transparent to the pilot.

4.3.3  Voice  echo 

Although the read out line on the CDU was very effective there were situations
during which the pilot could not afford the time to scan the display. The use of the
synthetic voice system to provide a voice echo of the recognised word or phrase was
therefore investigated.

Initially, the digits 0-9 and the executive command 'ok' were echoed. The echo
was initiated as soon as the recogniser output its data string to the microprocessor
which hosted the synthetic voice card. This frequently resulted in the voice echo being
initiated while the pilot was still using the DVI. This often upset the rhythm of the
pilot's DVI commands. Nevertheless, pilots found the audio confirmation useful and they
found themselves scrutinising the read out line and the CDU data entry fields less
often.




The next stage was to hold off the voice echo until the DVI activate switch was
released. The echo was heard within half a second of release of the switch. This was a
much more satisfactory method of implementation which was much preferred by pilots. In
addition, the list of words which were echoed was increased to include the keywords such
as 'comms' and 'navigation', and also some of the more commonly used words such as
'plot'. Many of the DVI commands related to display mode changes were not echoed as
this would unnecessarily create excessive audio. If a request for a display mode had
been made then the pilot would be looking at the display and there would be no need for
audio confirmation.
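
The hold-off behaviour described above can be illustrated with a short sketch. It is not the code flown in the Wessex; the class name, method names and the contents of the echo vocabulary are assumptions chosen for illustration, the vocabulary being limited to the words the text says were echoed.

```python
from dataclasses import dataclass, field

# Hypothetical echo vocabulary: digits, 'ok', and the keywords named in the text.
# Display mode requests (e.g. 'map') are deliberately left out and so are never echoed.
ECHO_VOCABULARY = {str(d) for d in range(10)} | {"ok", "comms", "navigation", "plot"}


@dataclass
class VoiceEcho:
    """Buffers recognised words and releases them only when the switch is released."""
    switch_pressed: bool = False
    pending: list = field(default_factory=list)

    def press(self):
        # Pilot presses the DVI activate switch; nothing is spoken while it is held.
        self.switch_pressed = True

    def recognised(self, word):
        # Called for each word output by the recogniser; keep only echoable words.
        if word in ECHO_VOCABULARY:
            self.pending.append(word)

    def release(self):
        # Switch released: hand the buffered words to the speech output and clear them.
        self.switch_pressed = False
        spoken, self.pending = self.pending, []
        return spoken
```

Buffering until release keeps the synthetic voice from interrupting the pilot in mid-command, which is the fault the text reports with the first implementation.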

Frequently mis-recognition patterns were dominated by the same word. This would
be no problem if the word was 'Machrihanish Talkdown' but if it was 'two' then the
situation would be more serious. Increasing the number of syllables clearly improved
recognition performance but a significant number of utterances, such as the digits, were
constrained to remain monosyllabic.

Digits caused major problems initially but with pilot experience and an improvement
to the SR-128, this difficulty was largely overcome. The ASR may well recognise
95 out of 100 different utterances. When inputting large strings of digits or commands
the individual word error rate becomes less important. The important factor is the
probability of inputting a complete communication sequence. Word error rates of 5% may
then become communication sequence error rates of 50% which is not satisfactory.
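
The step from a word error rate to a sequence error rate can be made explicit with a small calculation. The sketch below assumes that recognition errors on successive words are independent, which the memorandum does not state; under that assumption a sequence of n words is entered correctly with probability (1 - p)^n, where p is the word error rate. The function names are illustrative only.

```python
import math


def sequence_error_rate(word_error_rate, words_in_sequence):
    """Probability that at least one word in the sequence is mis-recognised,
    assuming independent errors on each word."""
    return 1.0 - (1.0 - word_error_rate) ** words_in_sequence


def words_for_sequence_error(word_error_rate, target_sequence_error):
    """Sequence length at which the sequence error rate reaches the target."""
    return math.log(1.0 - target_sequence_error) / math.log(1.0 - word_error_rate)


# A 5% word error rate reaches a 50% sequence error rate at roughly 13 to 14 words.
print(round(words_for_sequence_error(0.05, 0.50), 1))
# A 1% word error rate gives a little under 5% for a five word sequence.
print(round(sequence_error_rate(0.01, 5), 3))
```

On this reading, the quoted figure of 50% corresponds to sequences of about 13 or 14 words, for example a 10 figure waypoint group plus a few command words.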

4.3.4 Uses

The major benefits of DVI were not immediately obvious since they generally
occurred as a result of integration.

Reading in strings of digits such as 10 figure groups for waypoints was particularly
useful and much quicker than the manual equivalent. When inputting digit strings
by hand from a list the pilot has to read the list, look at the keyboard whilst keying,
and then the CDU to confirm accurate entry. In addition people frequently spoke the
digit string whilst keying. However, the action can be reduced to reading the digit
string out loud and checking recognition on the CDU screen or listening to the voice
echo. This aspect of integrated DVI and SSO was found particularly useful. Another
frequent use of DVI was to change the MFD formats particularly in hovering or NOE
flight, with the pilot flying with hands on controls at all times and eyes predominantly
outside the cockpit.

At other times combinations of hand and voice were used to input data.

During periods of intense concentration such as in unstabilised instrument flight
some pilots tended to use their hand to select the display element required and then
perform the data input and execution command by voice, thus the keyboard search task was
eliminated.

In busy audio environments pilots tended to favour manual input to speech input.
This was particularly true of simple selections where the pilot could act instinctively.
Using the DVI when listening to a radio message was difficult. The pilot had to concentrate
before beginning the conversation with the cockpit and this could easily lead to
his missing a radio or intercom message.

An interesting problem was encountered when one pilot returned from a German visit
with a voice template shift. The pilot had become accustomed to using different speech
inflexions to be understood by German Air Traffic Control. A speech template trained
immediately after he returned was quite useless within two days as his speech pattern
returned to normal. Retraining the template cured the recognition difficulty. This was
an interesting phenomenon which could have ramifications when flying from one country to
another where the pilot needs to apply an inflection to his voice in order to be
understood by air traffic.

4.4 Integration aspects

From these trials it is clear that DVI will often be used as the primary method of
control of systems and of data entry. However, because of the operational problems
which can sometimes arise it is unsuitable as the sole means of controlling the aircraft
systems. Voice recognition should therefore be regarded as an additional control
mechanism which can be used as an alternative to tactile methods of control such as
touch overlays or keyboards. This will provide a useful level of redundancy should one
of the tactile input systems fail, but more importantly speech provides a valuable
alternative whenever tactile control is inconvenient for the pilot to use.

A DVI system, if properly integrated with other cockpit systems, has been shown to
be extremely effective. Without careful integration the DVI system will prove
'unfriendly' and in the limit unusable by pilots. The trials have established guidelines
for integration in three areas:

(a)  DVI  enable 

The DVI enable mechanism which was used in the Wessex relied on the pilot using a
flight control mounted switch. This switch relayed the pilot's microphone to the DVI.
The pilots found this position easy to use and they could fly 'hands-on' throughout.
Subsequent DVI messages were limited only by the syntax needed to maintain adequate
recognition performance and sensible correlation with the CDU programme. The technique
adopted was identical to that when using the radio 'press to transmit' button, and as
such pilots found it a natural control.

A method of enabling the DVI which was considered but not adopted was that of
using distinctive keywords to engage and disengage the DVI, over a 'hot microphone'.

Pilots disliked the keyword philosophy for a number of reasons:

(i) The keywords would have to be carefully chosen so that they were distinctive and
rare in normal conversation. Although accidental engagement might be rare, nuisance DVI
activation would be very annoying.

(ii) Keywords would increase the transaction time, particularly for simple one word
commands.




(iii) The pilot would have to remember to use the de-activate keyword at the end of
every communication with the recogniser.

(iv) Pilots disliked any form of artificial limitation to the use of DVI. In particular,
rigid syntax was criticised. Keywords represent a form of rigid syntax which when
combined with other syntax within the recogniser programme would require an unnatural
form of pilot-DVI communication.

(b) Tracking of I/O systems

With a multitude of methods of controlling systems and inputting data, it was
found to be imperative that each control system could track the operations of others in
order that any type of control could be used by the pilot at any time. For example, the
syntax used within the DVI must track the CDU menu structure and vice versa. If this
was not done pilot workload could increase significantly when an operation started with
one input channel could not be completed by another.
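
One way of achieving mutual tracking is to hold the active page in a single shared record which every input channel both writes to and listens to. The sketch below illustrates that idea only; it is not a description of the Wessex software, and the class names, method names and page names are all assumptions.

```python
class SharedSyntaxState:
    """Single record of the active page, shared by every input channel."""

    def __init__(self, page="main"):
        self.page = page
        self._listeners = []

    def subscribe(self, listener):
        # Register a channel and bring it into step with the current page at once.
        self._listeners.append(listener)
        listener(self.page)

    def select(self, page):
        # Any channel may change the page; every channel is told immediately.
        self.page = page
        for listener in self._listeners:
            listener(page)


class InputChannel:
    """An input device (for example DVI or CDU) that follows the shared state."""

    def __init__(self, name, state):
        self.name = name
        self.active_page = None
        self._state = state
        state.subscribe(self._on_page_change)

    def _on_page_change(self, page):
        # A real DVI would switch its active vocabulary here.
        self.active_page = page

    def select(self, page):
        self._state.select(page)
```

Because no channel keeps a private copy of the page that can drift, an operation begun on one channel can be completed on another, which is the requirement the paragraph above sets out.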

(c)  Feedback  of  the  word  recognition  process 

When using DVI, it was important for the pilot to know what word or phrase the
ASR actually recognised. Often this was obvious, for example the correct multi-function
display format, such as the map, would appear when requested. However, on those
occasions when the system did not react or mis-recognised a command, the pilot found
that feedback of DVI recognition was essential. Two methods of providing this feedback
were investigated.

The first was a visual read-out of the previous and current outputs from the ASR,
which was displayed as a feedback line on the monochrome display of the CDU (see Fig 2).
This was very successful, enabling a pilot quickly to identify and correct errors caused
by mis-recognition or rejections (say it again).

The second was a voice echo using the synthetic speech system. The main benefits
of voice echo were noted during the insertion of long strings of figures such as waypoints
and frequencies in low level hovering and NOE flight. Pilots found their ability
to look out was much improved by voice echo since there was no need to look in and focus
on the CDU to check data input.

The voice echo was also useful when errors occurred. Pilots quickly picked up
wrong or absent responses, such as a wrong number or a recognition error which took the
programme to the wrong node of the syntax.

5 SPEECH OUTPUT RESULTS

The synthetic speech system was used during most flights so that its effectiveness
could be evaluated in a variety of flight regimes. Three types of output have been
investigated corresponding to priority information, optional information, and an audio
feedback of the DVI recognition process as the first stage of a voice interactive system
study.




5.1 Speech intelligibility

The vocabulary required for the trials could be produced using a portable speech
processor system. Prior to the flight trials several people, male and female, from many
occupational areas, were invited to program a sample vocabulary which was subsequently
tested in the laboratory. This was not a sophisticated test but simply consisted of a
few people comparing the resultant quality of speech. On this basis the voice of a
female secretary was judged to be the most intelligible and distinctive.

During flight, the intelligibility was not as good as in the laboratory but was
nevertheless satisfactory. The distinctive female voice could always be distinguished
from other aural sources such as helicopter crew or air traffic controllers. In
particular the word 'warning' which prefaced some of the messages was an excellent
attention-getter which never failed to focus the pilot's thoughts immediately on the
problem that had emerged.

The elements of the synthetic voice used for echo purposes were not so intrusive
that they blotted out incoming radio traffic or essential intercom chatter. Under such
circumstances, pilots quickly became accustomed to checking the CDU and filtering out
the synthetic speech.

The SSO in echo mode operated in a somewhat uneven staccato manner which was due
to the speech processing system. To improve this the synthetic voice will need to be
clearer and faster, and delivery will have to be better measured with similar gaps between
words. The system used could have been improved but considerable time and effort would
have been required to edit the stored vocabulary. In the future a true synthesis-by-rule
system, as opposed to one based on recorded speech, may be the best way of achieving
acceptable prosody.

5.2  Speech  output  control 

When the aircraft power supplies were activated prior to engine start the voice
card was programmed to output a time dependent recognition phrase such as 'good morning'.
In this way the pilot was assured that the voice facility was operational.

Certain words or phrases, listed in Table 6, remained active at all times. The
majority of these related to failures or limit exceedences, which occurred infrequently,
unless deliberately induced. The appropriate single warning was given, such as 'check
fuel'. This was not repeated unless the fault disappeared, or was rectified, and then
developed again. The torque margin output was repeated every 7 s if the limit continued
to be exceeded.

The second group, listed in Table 5, corresponded to optional information which
was all related to height and height clearances. All outputs were repeated every 7 s if
the cause of the initial call still remained. To avoid nuisance warnings which could
occur during some modes of flight, this set of calls could be disabled by the pilot
using a control on the CDU display. When disabled in this way the voice system
announced it had been disabled by saying 'goodbye'.




The group of calls could be enabled by selecting the same control on the CDU; this
prompted a 'hello' to confirm the system was again operational.

This re-activation process also took place via the control program. When ILS or
MLS were selected the SSO optional vocabulary was activated as the aircraft passed
through 500 ft (on rad alt) in the descent. Thus the SSO said 'hello' at 500 ft and
then continued down to decision height in 50 ft steps.

It is worthy of note that even though pilots were working hard on some approaches
they looked forward to the 'hello' call followed by the height count down despite the
natural tendency to filter out extraneous audio information. This made these calls
particularly beneficial. The height calls and pull up command were also used during low
level night operations. When hovering or landing the calls were usually inhibited to
avoid nuisance. To ensure that the system was re-activated if the pilot had forgotten
to do so himself, the system was automatically reselected when the groundspeed increased
through 30 kn.
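
The enable, disable and automatic reselection logic for the optional calls can be summarised in a short sketch. It is illustrative only: the class and method names are assumptions, and the text does not say whether the automatic reselection at 30 kn was itself announced with a 'hello', so that behaviour is assumed here.

```python
class OptionalCalls:
    """Pilot-selectable height calls with automatic reselection on speed increase."""

    RESELECT_SPEED_KN = 30.0  # groundspeed threshold quoted in the text

    def __init__(self):
        self.enabled = True
        self._last_groundspeed_kn = 0.0

    def toggle(self):
        # Pilot selects the CDU control; the returned phrase confirms the new state.
        self.enabled = not self.enabled
        return "hello" if self.enabled else "goodbye"

    def update_groundspeed(self, groundspeed_kn):
        # Reselect only on an upward crossing of the threshold, not while above it.
        crossed_up = (self._last_groundspeed_kn < self.RESELECT_SPEED_KN
                      <= groundspeed_kn)
        self._last_groundspeed_kn = groundspeed_kn
        if crossed_up and not self.enabled:
            self.enabled = True
            return "hello"  # assumed confirmation phrase
        return None
```

Triggering on the crossing rather than on the speed itself lets a pilot disable the calls again in forward flight without the system immediately overriding him.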

5.3  Operational  effectiveness 

5.3.1 Priority information

Although under normal circumstances most of the warnings would occur infrequently
all could be artificially induced by various means. Whenever a problem was discovered
several visual warnings in addition to a voice call would be given to the pilot. For
the most critical problems, such as an engine failure, the multi function display
was automatically presented with a format which showed the problem together with a small
list of checks. In addition the CDU display was cleared except for a short message
which repeated the voice call. The next level of warning, such as a fuel contents
mismatch between tanks, produced the voice call and a message on the CDU. The final
level produced only the voice call. It should be noted that all the warnings, whether a
display format change occurred or not, produced colour changes on at least one of the
formats. For example, if the torque limit was exceeded the torque strip on the PFD
would change colour from white to magenta. In this way any speech output warning was
reinforced by a change in format and/or symbology colour change. This combination was
very effective and attracted the pilot's attention to the salient feature of the display.
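
The three levels of warning described above amount to a small lookup from severity to the set of cues presented. The sketch below restates that mapping; the level names and field names are hypothetical labels, since the memorandum does not name the levels.

```python
from dataclasses import dataclass
from enum import Enum


class WarningLevel(Enum):
    CRITICAL = 1   # e.g. engine failure
    CAUTION = 2    # e.g. fuel contents mismatch between tanks
    ADVISORY = 3   # voice call only


@dataclass(frozen=True)
class WarningPresentation:
    voice_call: bool
    cdu_message: bool
    mfd_format_change: bool
    colour_change: bool


def presentation_for(level):
    """Return the cues given to the pilot for a warning of the given level."""
    return WarningPresentation(
        voice_call=True,  # every level produced a voice call
        cdu_message=level in (WarningLevel.CRITICAL, WarningLevel.CAUTION),
        mfd_format_change=level is WarningLevel.CRITICAL,
        colour_change=True,  # all warnings changed colour on at least one format
    )
```

Every level carries both a voice call and a colour change, so the aural and visual cues always reinforce each other, which is the combination the text reports as very effective.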

If the pilot was head out on a nap of the earth exercise, a simulated cable
warning could be programmed to appear during the sortie. On all occasions the speech
output attracted the pilot's attention, and on looking at the displays the problem was
quickly diagnosed, confirmed and dealt with. The torque limit warning often occurred
when in low forward speed or hovering flight. On hearing the warning 'torque' pilots
would often lower the lever, if it was safe to do so, without looking immediately into
the cockpit, and thus concentration on the visual scene could be maintained.

5.3.2 Optional information

A list of the optional vocabulary is shown in Table 5. These words were usually
enabled by the pilot (or the main processor) during instrument approaches or transits.
The majority of the vocabulary was used in the generation of aircraft height calls
during an instrument approach. When the decision height was reached a 'decision height'
call together with a DBT flag on the PFD would be activated.

A programmable low height warning could be selected by the pilot. If the aircraft
descended below this height a 'pull up pull up' message was generated. This was
used during any flight condition when maintaining a set height clearance was essential.

This was particularly useful during any operation which demanded minimum terrain
separation. The feature was specifically introduced for low level NVG operations.
Height assessment during these operations was far from being easy or accurate due to
the limited field of view of the NVGs and the monochrome, artificial nature of the
image seen by the pilot. In addition to the pull up command the height calls at low
level gave useful trend information. For example, when flying at 100 ft above ground
level the pull up command could be set to 70 ft. Height calls at 100, 90 and 80 ft would
be heard before the final 'pull up pull up' command was given.
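
The worked example above (cruise at 100 ft, pull up set to 70 ft) can be reproduced with a few lines of code. This is a sketch under stated assumptions: the 10 ft call interval is inferred from the example rather than stated as a rule, only a descent is modelled, the 7 s repetition is omitted, and the function name and arguments are hypothetical.

```python
import math


def height_calls(rad_alt_samples_ft, pull_up_height_ft, step_ft=10):
    """Return the spoken calls for a series of radar altimeter samples."""
    calls = []
    last_level = None
    below = False
    for height in rad_alt_samples_ft:
        if height < pull_up_height_ft:
            # Below the programmed height: give the command once per excursion.
            if not below:
                calls.append("pull up pull up")
            below = True
            continue
        below = False
        # Round up to the call level just reached, e.g. 85 ft is still '90'.
        level = math.ceil(height / step_ft) * step_ft
        if level != last_level and level > pull_up_height_ft:
            calls.append(str(level))
        last_level = level
    return calls


print(height_calls([100, 95, 90, 85, 80, 75, 70, 65], pull_up_height_ft=70))
# ['100', '90', '80', 'pull up pull up']
```

The successive calls give the pilot the trend information described in the text before the command itself is reached.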

When the height director mode was selected on the PFD two further calls corresponding
to 'high' and 'low' were enabled. These were used to define height boundaries
about the height profile that was flown on the height director. They were reinforced
with flags on the PFD.

Pilots liked the advisory information and warnings related to height. Although
they did not reduce workload pilots felt more confident when they were present. There
is little doubt that such a system would have a tremendous impact on the safety of both
civil and military operations. The ability to disable the calls at the pilot's discretion
was considered vital to avoid unnecessary chatter at the end of an approach or during
general visual flight. Without this facility the synthetic speech would be a nuisance.
Pilots would switch it off and the benefit of safety calls would be lost too. System
logic to ensure that the speech output was enabled prior to an approach has proven to be
effective.

5.4 Voice interaction

This has already been discussed to some extent in section 4.3.3 which described
the use of synthetic speech as a feedback to the pilot of DVI recognition. The success
of a full voice interactive system will depend heavily on the operating environment in
which it is asked to work. During the Wessex trials two comments were often made.

(a) There were times when DVI operation was difficult due to other audio tasks.

(b) The speech output should be kept to an absolute minimum to avoid it masking other
radio and crew messages and becoming a distracting nuisance.

Having pointed out these limitations there may still be many occasions when
question/answer sessions between a pilot and the avionics systems could be effective and
worthwhile. Such a system would require complex computing but, with current technology,
is a realistic proposition. The acceptability and effectiveness of soliciting synthetically
spoken information by voice will be addressed in future trials.




6 CONCLUSIONS

The Wessex flight trials have successfully demonstrated the capabilities, benefits,
and problems of using speech recognition and synthesis systems in conjunction with other
advanced cockpit facilities. The technologies integrated with the speech systems
included colour CRT displays, digital maps, touch overlays, joysticks, processing and
interfacing to other airborne sensors and equipment such as engines, transmission,
radios, guidance, navigation, and airdata systems.

(a)  Speech  recognition 

(1) The performance of a speech recognition system has been found to be very dependent
upon vocabulary content, syntax governing the number of active words, ambient noise and
speaker characteristics.

(2) During trials in which pilots were briefed to use the DVI only when they felt it
would be more useful than other control mechanisms, a mean word error rate of 6% was
recorded. On some flights 100% success was achieved.

(3) Pilots found the speech recognition system extremely useful for many different
tasks such as navigation updates, radio management and display mode selections. It was
particularly useful when visual hand/eye co-ordination tasks were absorbing considerable
pilot effort, such as during nap of the earth flight.

(4) Operational conditions have been encountered when the DVI system was difficult to
use and other methods of control, such as the touch overlay, were favoured. These conditions
were associated with a busy audio environment, for example during the marshalling
phase of an approach. Under such circumstances DVI was difficult to use because of the
random nature of incoming information and a need for the pilot to gather his thoughts
before speaking.

(5) DVI systems need to be properly integrated within the core avionics system if the
potential benefits of speech recognition are to be fully realised. In particular the
following are required:

- a method by which the pilot can activate the DVI using a cyclic or collective
mounted switch, thus enabling hands on control.

- a core avionics system capable of controlling all input and output devices to
enable the pilot to commence an input with one system and complete it with
another.

- a feedback, both visual and aural, of all or part of the recognition process to
the pilot.

(6) Speech recognition systems need to use AGC on the pilot's voice input microphone
level or have recognition algorithms which are tolerant of the varying signal level of
speech that occurs naturally in everyday conversation and in the cockpit.

(7) Active noise compensation techniques are required to avoid the requirement to load
'noise masks' into the recogniser, a process which is both tedious and time consuming.




(8) To be operationally acceptable future recognition systems must be tolerant of the
variable nature of pilots' speech during all phases of flight as well as during stressful
conditions.

(9) Random word recognition success rates in excess of 99% are required if DVI is to
become a realistic tool in the helicopter cockpit of the future. This will provide an
average communication sequence error rate of 5% which will be acceptable. Although
systems are incapable of meeting this target at present, future systems show every
indication of having sufficient performance.

(10) During the course of a helicopter mission the loading placed on the sensing and
control mechanisms of a pilot varies considerably. At times, for example during low level
flight, his hands will be fully committed to the flying controls and his eyes will be
concentrating mainly on outside features. Under these circumstances DVI can be
extremely beneficial. At other times, for example during complex cockpit crew procedures,
in busy radio environments, or in high stress situations, the aural channels
can be saturated or degraded. Then touch overlays and hands-on controls can be superior.
Between these two extremes a blend of input methods is required.

For these reasons it is not envisaged that DVI will replace any control mechanism.
It offers an element of redundancy and if correctly integrated into the cockpit it will
be capable of improving the man machine interface to a far greater degree than hand or
voice alone.

(11) The training of voice templates for the DVI system needs to be considered more
fully in operational situations. Although time can be expended in a research environment
to train the DVI carefully and then edit any anomalies in flight, this would not be
cost-effective or realistic in operational use. Pilots will require a one pass guaranteed
training session which should be possible to accomplish off the aircraft.

(12) The method of loading pre-recorded templates into the DVI system needs to be
considered. This should be a simple, quick process, which the pilot can initiate from
the cockpit.

(13) Future DVI systems should allow multi-crew operation. This does not require
speaker independent systems but simply the capability of storing and using more than one
set of voice templates.

(14) Limited syntax improved ASR performance but excessive use of syntax degraded
user-friendliness of the total system. Future systems should seek to remove rigid syntax by
the use of further post-recognition intelligence and dynamic control of some of the key
recognition parameters such as distance thresholds and weighting factors.

(b)  Speech  output 

(1) A speech output system based on linear predictive coding techniques and using a
female voice has been proven to be intelligible and distinctive in the helicopter cockpit
environment.




(2) The use of a 'warning' keyword before critical messages was found to be a very
useful attention-getter, which pilots reacted to very quickly compared to other
less-critical alerts.

(3) Two levels of speech output have been found to be required. The first corresponds
to high priority information which occurs infrequently. These should not be
selectable and for the majority of cases they should produce a single voice warning.

The second level corresponds to height related calls and warnings. These can occur
frequently, sometimes during operations for which they were not intended. These should
be pilot selectable to avoid becoming a nuisance.

(4) The optional height call and warning system would improve the safety of both
military and civil operations. Pilots felt more confident when the speech system was
enabled during, for example, an instrument recovery or during low level flight at night.

(5) The use of speech output to provide indications of failures allowed pilots to stay
head out without constantly needing to monitor displays. The use of voice, display
format changes, and colour symbology changes to highlight problems to the pilot was
found to be a very effective combination.

(6) Speech output must be used sparingly to avoid it becoming a nuisance. Many
warnings would be given only once. Height calls or warnings, if the error condition
persisted, were only repeated every 7 s.

(7) The use of speech output to act as a feedback of the DVI recognition process has
been demonstrated.

This was not distracting when correctly used and reduced the need to monitor the
CDU during data insertion. Better look out resulted. In busy audio environments the
pilot filters the relevant information and thus acts upon the most important data. For
example, a height call will be ignored when an incoming radio message is present,
whereas if a pull up command is heard the radio will be ignored.

(8) Consideration should be given to varying the volume for different warnings such
that flight critical messages cannot be missed or mis-heard.

The Wessex trials have clearly indicated that speech recognition and output can
give significant benefits to operational effectiveness and improve safety. However, to
do this the systems need to be carefully integrated with the many other controls and
facilities which make up a total display and flight management system.
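
The repetition rule in conclusion (6) can be illustrated with a short sketch. It is
not taken from the trial software: the class name SpeechScheduler, its method and the
repeatable flag are hypothetical, and only the 7 s repeat interval comes from the text.
A message is spoken once when its condition first appears; messages marked as
repeatable (height calls and warnings) are spoken again only after 7 s if the condition
persists.

```python
from dataclasses import dataclass, field
from typing import Dict

REPEAT_INTERVAL_S = 7.0  # repeat period quoted in the text for height calls and warnings


@dataclass
class SpeechScheduler:
    """Decides when a message may be spoken (hypothetical helper, not the trial code)."""

    last_spoken: Dict[str, float] = field(default_factory=dict)

    def should_speak(self, message: str, active: bool, now: float, repeatable: bool) -> bool:
        if not active:
            # Condition cleared: forget it so that a new occurrence is announced again.
            self.last_spoken.pop(message, None)
            return False
        previous = self.last_spoken.get(message)
        if previous is None:
            # First time the condition is seen: always announce it.
            self.last_spoken[message] = now
            return True
        if repeatable and now - previous >= REPEAT_INTERVAL_S:
            # Condition has persisted for the full interval: repeat the call.
            self.last_spoken[message] = now
            return True
        return False
```

With this arrangement a one-shot warning stays silent until its condition clears and
reappears, which keeps the amount of speech output low.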


Appendix A
THE WESSEX FACILITY

A.1 The cockpit

Fig 1 is a photograph of the Wessex cockpit. The left hand side (LHS) of the
instrument panel was the subject pilot's position and the controls and displays
available to him were as follows:

Two colour displays (one with a touch overlay)

A control and display unit (CDU) with touch sensitive overlay and 16 key keyboard

ASR

SSO

Cursor controls and ASR activate switch on cyclic

Map joystick on collective

The right hand side (RHS) instrument panel was used by the observer or safety
pilot. Either of the two current display formats could be selected and shown on the RHS
tube. A full set of standard electromechanical instruments was retained for use by the
safety pilot, for critical sorties such as evaluations of instrument flight in IMC. All
the aircrew could hear the SSO.

A.2 Colour displays and formats

The colour displays were standard commercial monitors which were fed with video
(domestic standard PAL) signals. A video switching and distribution system enabled any
colour CRT to display either of the two formats which were available at any one time.

One of these formats, which was always displayed, was the primary flight display (PFD).
This format displayed sufficient information to enable the pilot to control, navigate,
approach and land the helicopter, in visual and instrument flight. Using dedicated
keys or the control and display unit (CDU) the subject pilot could choose to display one
of a number of formats on the second multifunction display (MFD) CRT. In addition, with
a map format selected the GEC Avionics tactile overlay could be used to designate a
waypoint.

A.3 Control and display unit

The control and display unit consisted of a monochrome CRT with a touch sensitive
overlay, dedicated keys to the left and right of the CRT, and a numeric keyboard.

The CDU display format was 'menu driven'. By touching the appropriate box on the
display the pilot could work his way through the menu. Careful integration of the CDU
menu with dedicated keys meant that no more than two selections were required before any
data could be inserted or altered.

The basic MENU page is shown in Fig 2. Selection of each option would provide a
new set of choices. For example, selection of CHECKS/LIMITS would then provide the
pilot with a choice of checks and limits. Some functions produced a further level of
selections. For example, if 'navigation' were selected via the dedicated key, followed
by 'update' on the touch sensitive overlay, a page would appear asking for the waypoint
number and its grid. Further selection of the 'waypoint' dedicated key would put the
waypoint listing on the MFD so that as waypoints were updated they could be checked
simultaneously for correct entry.

Although most of the CDU menu structure could be accessed quickly, the dedicated
keys made it more responsive. These keys short circuited the menu structure and allowed
the pilot to select directly without starting from the menu page.

A.4 Avionics and interfacing

The flight management processors were interfaced to each other, to the waveform
generators and the following systems and sensors:

Doppler ground speed sensor
Airspeed
Heading, pitch, roll attitudes
Barometric pressure
Outside air temperature
Radio altimeter
Normal and lateral accelerometers
Collective and cyclic control positions
Guidance systems: ILS, MLS and MADGE

Engine and transmission:
    Power turbine speeds
    Compressor speeds
    Power Turbine Inlet Temperatures (PTIT)
    Fuel flows
    Fuel contents
    Rotor rev/min
    Torque

UHF/VHF PTR 1751 radio (digitally controllable)
Digital map
Automatic speech recogniser (ASR)
Synthetic speech output (SSO)
Non-volatile memory
Radio clock

In addition, audio, video and digital recording systems were used to record the
display formats, pilot intercom, radio, and the sensor information.


Appendix B

SPEECH RECOGNITION AND ITS IMPLEMENTATION IN THE WESSEX


In recent years many commercial ASR systems have become available which showed
promise for use in airborne applications. The ASR system selected for the Wessex trials
was an SR-128 manufactured by Marconi Secure Radio Systems. This was a stand-alone
device suitable for mounting in a 19 inch rack. It was powered from the 115 V 400 Hz
instrumentation inverter to allow it to be used in the helicopter from start-up to
shutdown. Speech input was via a 600 ohm balanced line. Recognition and control data
were passed on RS 232 serial lines. A photograph of the equipment is shown in Fig 3.

The main features of the SR-128 were as follows:

Speaker trained

Connected word input

Vocabulary size of 240 words maximum

Memory for 128 s of speech utterances

Syntax programming

Ambient noise compensation (noise mask)

RS 232 interfaces

Self contained mini-cassette handler for program and voice template loading

B.1 Training

The SR-128 used an acoustic pattern matching technique to recognise words or
phrases. This was based upon the principle that for a single speaker the acoustic
signature of a particular word would be similar to a pre-recorded version of the same
word (the voice template). The user was therefore required to 'train' the ASR by
recording the intended recognition vocabulary. Thus the SR-128 can be classified as a
speaker dependent system, with any intending user preparing voice templates for
subsequent use of the equipment.
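
The idea of matching an utterance against stored templates can be sketched as below.
The text does not describe the SR-128 matching algorithm, so this example substitutes
dynamic time warping (DTW), a widely used template matching method, purely as an
illustration. The feature frames, function names and example values are all
assumptions.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector, eg filter-bank energies (assumed)


def frame_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(utterance: List[Frame], template: List[Frame]) -> float:
    """Dynamic time warping cost between two frame sequences."""
    inf = float("inf")
    n, m = len(utterance), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            # Allow the utterance to be locally slower or faster than the template.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognise(utterance: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the vocabulary word whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

Because the templates come from one speaker, a scheme of this kind is speaker
dependent in the sense described above: a different voice produces frames that sit
further from every stored template.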

For the Wessex trials the templates were prepared in the helicopter, on the
ground, without engines running. The pilot used the headset and microphone he intended
to use in flight. This ensured that the recorded templates were representative of
flight. The headset gave the pilot the audio feedback which he would receive in flight.
This was an essential element of the training since varying the feedback volume tended
to modulate the power of the user's utterances. Once training was complete the templates
were stored on tape. Prior to flight, the ASR programme plus the voice template were
loaded. This took 2-3 min.

B.2  Connected  speech  recognition 

Current ASR systems can be categorised by three methods of operation: isolated,
connected, and continuous word recognition.

Isolated word recognisers can only match single words one at a time, with periods
of silence between each word. The user has to speak in an unnatural staccato manner.


Connected word recognisers are able to match words or phrases naturally spoken.
However, the recognition process only commences when a period of silence is detected,
ie the speaker stops talking. There is therefore a time delay between the speech and
recognition occurring. This increases with longer strings of words.

Continuous speech recognisers are able to match words or phrases as they are
spoken without waiting for the speaker to pause.

Connected and continuous speech recognition is considerably more difficult than
isolated word recognition because:

(a) The recogniser has to choose from a much larger range of options.

(b) Co-articulation will introduce variations in individual words.

and (c) The sound of a word will vary depending upon its prosodic status in a phrase
or sentence.

The SR-128 is a connected word recogniser capable of matching a burst of speech of
up to 8 s in duration.
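
The behaviour of a connected word recogniser, which waits for silence before matching,
can be sketched as a simple burst detector. This is an illustration only: the energy
threshold, frame length and silence length are hypothetical parameters, and only the
8 s burst limit comes from the text.

```python
from typing import List, Sequence, Tuple

MAX_BURST_S = 8.0  # longest burst of speech the text says the SR-128 could match


def completed_bursts(energies: Sequence[float], threshold: float,
                     min_silence_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of bursts that have been closed by silence.

    `end` is exclusive. A burst still in progress when the data runs out is not
    returned, mirroring the fact that matching starts only after the speaker stops.
    """
    bursts: List[Tuple[int, int]] = []
    start = None   # index of the first speech frame in the current burst
    silent = 0     # consecutive silent frames seen inside the current burst
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_silence_frames:
                bursts.append((start, i - silent + 1))  # trim the trailing silence
                start, silent = None, 0
    return bursts


def fits_recogniser(burst: Tuple[int, int], frame_s: float) -> bool:
    """Check a burst against the 8 s limit, given the frame length in seconds."""
    start, end = burst
    return (end - start) * frame_s <= MAX_BURST_S
```

The delay noted in the text follows directly: nothing is passed on for matching until
the silence counter reaches its limit, so a longer string of words is held for longer
before recognition can begin.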

B.3  Vocabulary 

The maximum number of words or phrases that the SR-128 can handle is 240. Table 1
shows the latest set of words used in the Wessex system. It should be noted that
together with the vocabulary size specification, two other parameters were critical.
The first was the maximum length of time allowed for the voice templates (128 s) and
the second was the maximum duration of an individual word or phrase (2 s for this
equipment).
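
The three limits interact: if all 240 entries were used, the 128 s of template storage
would allow an average of only 128/240, or about 0.53 s, per entry, well below the 2 s
ceiling for a single entry. A minimal check of a candidate vocabulary against the three
figures quoted above might look like this; the function name and the example durations
are hypothetical.

```python
from typing import Dict, List

MAX_ENTRIES = 240     # maximum number of words or phrases
MAX_TOTAL_S = 128.0   # total template storage in seconds
MAX_ENTRY_S = 2.0     # longest single word or phrase in seconds


def check_vocabulary(durations: Dict[str, float]) -> List[str]:
    """Return a list of limit violations for a mapping of entry to template length (s)."""
    problems: List[str] = []
    if len(durations) > MAX_ENTRIES:
        problems.append(f"{len(durations)} entries exceeds the limit of {MAX_ENTRIES}")
    total = sum(durations.values())
    if total > MAX_TOTAL_S:
        problems.append(f"total template time {total:.1f} s exceeds {MAX_TOTAL_S} s")
    for entry, seconds in durations.items():
        if seconds > MAX_ENTRY_S:
            problems.append(f"'{entry}' lasts {seconds:.1f} s, over {MAX_ENTRY_S} s")
    return problems
```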

B.4 Syntax

Acoustic pattern matching is a very powerful technique with limited vocabulary
sizes (20-40 words). However, as the size of the vocabulary increases there is a
greater probability of words or phrases being confused and errors occurring. In
addition, acoustic pattern matching of larger vocabularies takes longer. This is
particularly so for connected and continuous word recognisers which have to cope with
many permutations of word strings. Under such conditions, recognition performance will
depend heavily on how closely the speaker's normal utterances resemble his recorded
templates.

To alleviate the problem of vocabulary size the ASR could be programmed to
restrict the number of active words at any one time by applying grammatical rules known
as syntax. Keywords provided access to different branches of the syntax tree. For
example, the keyword 'checks' would enable the word 'take-off' to be activated. The
syntax tree currently used in the Wessex is shown in Table 2.
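
The effect of syntax on the active vocabulary can be sketched with a small tree. Only
the link from 'checks' to 'take off' is stated in the text; every other branch below is
an illustrative guess built from words in Table 1, and the class and method names are
hypothetical.

```python
from typing import Dict, Set

ROOT = "<root>"

# Hypothetical fragment of a syntax tree. Only 'checks' -> 'take off' is taken
# from the text; the remaining branches are assumptions made for illustration.
SYNTAX: Dict[str, Set[str]] = {
    ROOT: {"checks", "limits", "menu"},
    "checks": {"take off", "land", "shutdown"},
    "limits": {"engines", "rotor and oil"},
}


class SyntaxState:
    """Tracks which words are active as keywords are recognised."""

    def __init__(self) -> None:
        self.node = ROOT

    def active_words(self) -> Set[str]:
        """Words the recogniser needs to match against at this point."""
        return SYNTAX[self.node]

    def accept(self, word: str) -> bool:
        """Advance through the tree if `word` is active; reject it otherwise."""
        if word not in SYNTAX[self.node]:
            return False
        # A keyword opens its own branch; any other word completes the command.
        self.node = word if word in SYNTAX else ROOT
        return True
```

At any moment the pattern matcher compares the utterance with a handful of templates
rather than the whole vocabulary, which is how syntax reduces both confusion between
words and matching time.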

B.5 Noise compensation

In noisy environments such as the helicopter, low energy speech features could be
completely masked by noise. In addition, the noise presence could significantly affect
the way the pilot speaks. As the recognition process depended entirely on pattern
matching techniques, background noise could significantly affect recognition
performance.

The SR-128 used a noise mask technique to reduce the effect of the background
noise. The ASR was given a noise sample via the pilot's open boom microphone. The
power spectrum of the noise was detected by a 19 channel filter at the input to the
recogniser. The output of the filter was called the noise mask. With the ASR in
recognise mode the noise mask was used in the recognition algorithms to compensate for
the noise superimposed on the pilot's speech.
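
A minimal sketch of how a noise mask might be formed and used is given below. The text
does not say how the SR-128 applied the mask, so the masking rule here (raising every
channel of both the input and the template to at least the noise level before they are
compared) is an assumption, as are all function names. Only the figure of 19 channels
comes from the text.

```python
from typing import List, Sequence

N_CHANNELS = 19  # filter-bank channels quoted in the text


def build_noise_mask(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-channel power over a noise-only sample."""
    if not noise_frames:
        raise ValueError("at least one noise frame is required")
    for frame in noise_frames:
        if len(frame) != N_CHANNELS:
            raise ValueError(f"each frame must have {N_CHANNELS} channels")
    count = len(noise_frames)
    return [sum(frame[ch] for frame in noise_frames) / count for ch in range(N_CHANNELS)]


def mask_frame(frame: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Raise each channel to at least the noise level, hiding detail buried in noise."""
    return [max(power, noise) for power, noise in zip(frame, mask)]


def masked_distance(speech: Sequence[float], template: Sequence[float],
                    mask: Sequence[float]) -> float:
    """Squared distance between two frames after both have been masked (assumed rule)."""
    a, b = mask_frame(speech, mask), mask_frame(template, mask)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Under this assumed rule, channels where both the speech and the template lie below the
noise contribute nothing to the distance, so low energy features that the noise has
swamped do not count against an otherwise good match.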

Thus there were three distinct actions to complete before the speech recogniser
was ready for use. Firstly load the programme, next load the operator's voice template
and lastly load the noise mask.

B.6 Interfaces

As words or phrases were recognised the SR-128 provided a stream of data on an
RS 232-C serial interface. This stream of data consisted of a start of data code, a
three digit template code number, and an end of data code. ASCII characters
representing the actual word or phrase and a template score code were also transmitted.
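
A host computer reading this stream would need to pick out the template code from each
message. The sketch below shows one way of doing so. The actual code values and field
layout are not given in the text, so the start and end codes (ASCII STX and ETX) and the
five byte frame are assumptions, and the word and score fields are ignored.

```python
from typing import List, Tuple

START = 0x02  # assumed start of data code (ASCII STX); the real value is not given
END = 0x03    # assumed end of data code (ASCII ETX); the real value is not given
FRAME_LEN = 5  # start code + three ASCII digits + end code (assumed layout)


def extract_template_codes(buffer: bytes) -> Tuple[List[int], bytes]:
    """Return the template codes found in `buffer` and any unconsumed tail."""
    codes: List[int] = []
    pos = 0
    while True:
        start = buffer.find(bytes([START]), pos)
        if start == -1:
            return codes, b""
        if len(buffer) < start + FRAME_LEN:
            return codes, buffer[start:]  # incomplete frame: keep it for the next read
        digits = buffer[start + 1:start + 4]
        if buffer[start + 4] == END and digits.isdigit():
            codes.append(int(digits))
            pos = start + FRAME_LEN
        else:
            pos = start + 1  # malformed frame: resynchronise on the next start code
```

The returned tail can be prepended to the next block of bytes read from the serial
port, so that a message split across two reads is not lost.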

A similar interface was used to prepare the vocabulary and syntax. This was also
used to control the operation of the ASR from the cockpit in flight, using the
'experimental' section of the CDU.

B.7  Pilot  control 

Speech input from the pilot's microphone was buffered at the ASR. A rocker switch
on the pilot's cyclic control column (previously used as the intercom override) was used
to activate the pre-amplifier and ASR. This prevented the DVI system from attempting to
recognise all of the pilot's speech as commands, and allowed it to receive only those
words intended for it.


Appendix C

TRIALS PROGRAMME AND PROCEDURES

Once the cockpit had been commissioned and the major hardware and software bugs
had been ironed out, a period of intense flying was conducted to refine and optimise the
use of the system. The aim was to ease the pilot's workload and hence improve mission
effectiveness.

Although individual equipment performance was an important factor and, where
possible, was measured, emphasis was placed on discovering how and where the new
technology could be applied to derive the maximum benefit.

C.1 Pilots

Three Service pilots were used to develop and evaluate the total display and flight
management system. This enabled pilot opinion, expertise, and a knowledge of operational
problems from a wide background to be input into the trials. As each pilot had to
attain and maintain a good level of understanding and proficiency with the system, it
was thought that further pilots would have diluted the amount of useful flying that
could be achieved. It should be noted, however, that over 80 pilots from the Services,
Civil operations and UK Industry have flown in the Wessex and received demonstrations of
the equipment and ideas in operation. Their comments have been taken into account
during system development and evaluation.

From the initial conception of the programme it was appreciated that the flight
management and display system could have an impact on both military and civil operations.
Hence tasks were given to the pilots to cover both types of operations, such as general
manoeuvres in visual flight; however, others such as nap of the earth (NOE) flight (low
level flight using the maximum cover afforded by natural and man-made obstructions),
were obviously aimed at the Services. The list below gives an indication of the tasks
used and the main reasons for adopting them:

(a) Low level navigation by day in VMC

-  navigation accuracy during NOE flight
-  hands on control of cockpit management system
-  benefits of speech recognition
-  benefits of synthetic speech
-  benefits of map presentations
-  cable/threat warnings display logic
-  use of the chin-up CDU during a primarily head-out operation
-  overall system effectiveness

(b) Low level navigation at night

-  all items listed in (a)
-  night vision goggle compatibility with digital maps
-  primary flight display format development
-  overall system assessment during a very demanding task


(c) Medium level navigation in visual and instrument flight

-  navigational accuracy compared to task (a)
-  performance prediction, fuel management, endurance monitoring
-  radio management in multi radio/frequency areas (eg London Control Zone)
-  primary flight display development
-  multi function display format development
-  CDU menu structure

(d) Low speed manoeuvres

-  hover display format
-  low speed envelope prediction
-  digital map targeting
-  speech recognition benefits
-  synthetic speech benefits

(e) Terminal/approach guidance

-  primary flight display development
-  synthetic speech during instrument flight
-  use of digital maps for self-positioning tasks
-  MLS and ILS formats
-  speech recognition benefits

(f) General flying/crew training

-  development of logic/artificial intelligence to signal malfunctions
   to the pilot.


Table 1

SPEECH RECOGNITION VOCABULARY

ok
0 1 2 3 4 5 6 7 8 9
take off
land
shutdown
reaction
engines
rotor and oil
ILS
MLS
go around
approach heading
decision height
cruise height
cruise speed
glidepath
plot
base
steer
marker
update
route select
route alpha
route bravo
route charlie
survey
menu
map up
hover
tees and pees
waypoint
tactical
status
checks
limits
guidance
navigation
rubout
flight
ppi
data
preset 1
preset 2
preset 3
preset 4
box 1
box 2
icing
Bedford tower
Bedford approach
Bedford radar


Table 2

[Syntax tree used in the Wessex; diagram not legible in the source scan]


Table 3

TOTAL SPEECH OUTPUT VOCABULARY

torque
pull up
good morning
good evening
hello
goodbye
activated
ok
ten
twenty
thirty
forty
fifty
sixty
seventy
eighty
ninety
one hundred
one two five
one fifty
two hundred
two fifty
1 hundred & 25
1 hundred & 50
2 hundred & 50
five hundred
thousand
hundred
one
two
three
four
five
six
seven
eight
nine
zero
low
high
deceleration
decision height
glidepath
radio
barometric
look up
height
range
check port T4
checks complete
check stbd T4
Nr high
Nr low
warning
check port compressor
engines
alert
attention
check stbd compressor
attention
check fuel
fuel mismatch
T4 mismatch
Ng mismatch
low fuel
check Nr
reduce power
reduce speed
emergency
check
navigation
comms
guidance
plot

Table 4

PRIORITY SPEECH OUTPUT VOCABULARY

Event                                       Speech output

power turbine mismatch (needle split)       warning engines
stbd engine temperature > limit             check stbd T4
port compressor rpm > limit                 check port compressor
stbd compressor rpm > limit                 check stbd compressor
fuel flow mismatch > limit                  warning fuel mismatch
fuel tank contents mismatch > limit         check fuel
fuel in either tank < limit                 warning check fuel
rotor speed out of limits                   warning check Nr
torque > limit                              torque
cable detected                              warning
radar detected                              warning
airspeed > maximum allowed (Vmax)           reduce speed
engine temperature mismatch                 warning T4 mismatch
compressor mismatch                         warning compressor mismatch



Table 5

OPTIONAL SPEECH OUTPUT VOCABULARY

Event                                       Speech output

if height < 252 and height > 248            two fifty
if height < 202 and height > 198            two hundred
if height < 152 and height > 148            one fifty
if height < 102 and height > 98             one hundred
if height < 92 and height > 88              ninety
if height < 82 and height > 78              eighty
if height < 72 and height > 68              seventy
if height < 62 and height > 58              sixty
if height < 52 and height > 48              fifty
if height < 42 and height > 38              forty
if height < 32 and height > 28              thirty
if height < decision height                 decision height
if height > desired height on directors     high
if height < desired height on directors     low
if height < height for radalt               pull up pull up
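
The banded height calls in Table 5 can be expressed compactly as a lookup. The sketch
below covers only the fixed height calls; the function name is hypothetical, and the
height is assumed to be in the same units as the table, which does not state them.

```python
from typing import List, Optional, Tuple

# (call height, phrase) pairs taken from Table 5.
HEIGHT_CALLS: List[Tuple[int, str]] = [
    (250, "two fifty"), (200, "two hundred"), (150, "one fifty"),
    (100, "one hundred"), (90, "ninety"), (80, "eighty"), (70, "seventy"),
    (60, "sixty"), (50, "fifty"), (40, "forty"), (30, "thirty"),
]
BAND = 2  # each call is active strictly within +/- 2 of its height, as in Table 5


def height_call(height: float) -> Optional[str]:
    """Return the phrase to speak for this height, or None if no band applies."""
    for target, phrase in HEIGHT_CALLS:
        if target - BAND < height < target + BAND:
            return phrase
    return None
```

For example, height_call(59) gives 'sixty', while height_call(120) gives None because
120 falls between the bands for one hundred and one fifty.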

REFERENCES

No.  Author      Title, etc

1    R. Little   The development of an electronic cockpit display and
                 flight management system for helicopters.
                 Technical Memorandum FS(B) 586 (1985)


Fig 2 CDU menu page



Fig  4  Noise  masks  in  different  flight  modes  (mean  values) 


Fig 7 Confusion matrix for digits


Fig  8  Word  usage  on  DVI 


